🗄️ Web Datasets
Common Crawl, Corpus, Training data, Web scraping
Scoured 28869 posts in 64.0 ms

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents
🔧 Agent Tooling · arxiv.org · 1d

You can now build directly on Common Crawl from the browser
🕷️ Web Crawling · commoncrawl.org · 6d

The Corpus Problem
🔤 Tokenization · ren.phytertek.com · 11h

colon-md/retrievalci: Benchmark hosted RAG services (Vertex / Bedrock / Azure / OpenAI File Search) on your own corpus, in CI, with cost-capped lifecycle.
🔎 Meilisearch · github.com · 17h · Hacker News

Web Feeds in 2026: A Survey
📰 RSS Reading Practices · mnot.net · 1d · Lobsters, Hacker News

Read the first few comments and surprised I didn’t see it, but training data. Th ...
🕯️ Candle · news.ycombinator.com · 17h · Hacker News

UCT Protein Intelligence v4 — 15-Module Analytical Protein Physics
📇 Vector Indexing · aidoctrine.github.io · 5d · Hacker News

DuckDB Monthly #41: DuckDB internals course, FTS walkthrough, and a satellite pipeline with H3 + Parquet
🗄️ libSQL · motherduck.com · 1d

Connecting the dots for accurate AI
📱 Edge AI Optimization · stackoverflow.blog · 13h

Finally, simple updates that diversify a model’s training data can make a difference. We added unrelated tools and system prompts to a simple chat dataset targe ...
🎛️ Feed Filtering · twitter.macworks.dev · 4d

Habeas Class Actions
⚖️ Civil Liberties · harvardlawreview.org · 2d

Sentiment is not one signal [Tommi Johnsen]
⭐ Content Scoring · tommijohnsen.substack.com · 5d · Substack

Anthropic says Claude models no longer show blackmail behavior
🎭 Claude · kite.kagi.com · 2d

WorldSpeech: A Multilingual Speech Corpus from Around the World
🕸️ Sparse Vectors · arxiv.org · 14h

Mythos 'Discovered' a CVE Already in Its Training Data - and That’s Still Worrying
🔓 Hacking · rival.security · 3d · Lobsters, Hacker News

What if LLMs are mostly crystallized intelligence?
🏆 LLM Benchmarking · lesswrong.com · 6d · Hacker News

Cheniere Energy: Buy On Project And Export Expansion Ahead (NYSE: LNG)
💵 Dollar Hegemony · seekingalpha.com · 5d

MicroWorld: Empowering Multimodal Large Language Models to Bridge the Microscopic Domain Gap with Multimodal Attribute Graph
✨ Gemini · arxiv.org · 14h

Structuring AI responses as HTML
👨‍💻 AI Coding · dsebastien.net · 1h

Show HN: Voice gender classifier for European voice AI (1MB, ONNX, 4ms)
🔤 Tokenization · huggingface.co · 5h · Hacker News